Fix age histogram to track allocation lifetime instead of age at allo… #40

Merged
saimizi merged 1 commit into main from fix/age-histogram-lifetime-tracking
Oct 10, 2025

@saimizi saimizi commented Oct 10, 2025

What's Changed

Overview

This PR fixes the age histogram feature in Statistics Mode. Allocation lifetimes are now calculated at free time in BPF, and unfreed allocations are counted conservatively in userspace.

Problem Summary

The current implementation has two issues:

  1. The histogram is updated at allocation time, when age is always ~0, so every allocation lands in the 0-1min bucket
  2. The histogram is never decremented when allocations are freed

See issue #39 for details.
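
A toy model of the old counters makes both issues concrete. This is a userspace C sketch with hypothetical names (`old_stats`, `old_on_alloc`, `old_on_free`), not the project's BPF code. It assumes that an age of ~0 always maps to bucket index 0 and that `total_unfreed_count` is decremented on free, neither of which is shown in this PR.

```c
#include <stdint.h>

/* Minimal stand-in for the malloc_record fields discussed here. */
struct old_stats {
    uint32_t age_histogram[4];
    uint64_t total_unfreed_count;
};

/* Old allocation path: age is ~0 at this point, so bucket 0 is always hit. */
static void old_on_alloc(struct old_stats *s)
{
    s->total_unfreed_count++;
    s->age_histogram[0]++;
}

/* Old free path: the unfreed count drops, but the histogram is left alone. */
static void old_on_free(struct old_stats *s)
{
    if (s->total_unfreed_count > 0)
        s->total_unfreed_count--;
}
```

After three allocations followed by three frees, this model reports `total_unfreed_count == 0` but `age_histogram[0] == 3`, regardless of how long the allocations actually lived.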

Solution Approach

Key Insight

Since Statistics Mode doesn't preserve individual allocation records, we cannot calculate the exact age distribution of currently unfreed allocations. However, we can provide valuable insights by:

  1. In BPF: Track the lifetime of allocations when they are freed (age at free time)
  2. In Userspace: Conservatively add all unfreed allocations to the longest age bucket (30+ min)

This gives us:

  • Accurate lifetime distribution for freed allocations
  • Conservative estimate for long-lived unfreed allocations
  • A combined view of freed and unfreed allocations in a single histogram

Histogram Semantics

After this fix, the four-element age_histogram array will represent:

AGE_RANGE_0_1MIN   (0-1 minute):   Count of allocations freed within 1 minute
AGE_RANGE_1_5MIN   (1-5 minutes):  Count of allocations freed between 1-5 minutes
AGE_RANGE_5_30MIN  (5-30 minutes): Count of allocations freed between 5-30 minutes
AGE_RANGE_30MIN_PLUS (30+ minutes): Count of allocations freed after 30+ minutes
                                    PLUS count of currently unfreed allocations

The last bucket includes unfreed allocations because:

  • They are still in memory, so their final lifetime is unknown and can only grow
  • Placing them in the last bucket gives a conservative (upper-bound) estimate
  • It highlights potential memory leaks or legitimate long-lived allocations
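
The bucket semantics above can be written down as a small classifier. This is a plain C sketch, not the project's code: `lifetime_bucket`, `merged_bucket_count`, and `NS_PER_MIN` are hypothetical names, the enum values are assumed to be 0 through 3 in the order listed, the histogram element type is assumed to be 32-bit, and the exclusive upper bounds (a lifetime of exactly 1 minute lands in the 1-5 minute bucket) are an assumption this PR does not settle.

```c
#include <stdint.h>

#define NS_PER_MIN (60ULL * 1000000000ULL)

/* Assumed ordering of the four ranges named in this PR. */
enum age_range {
    AGE_RANGE_0_1MIN = 0,
    AGE_RANGE_1_5MIN,
    AGE_RANGE_5_30MIN,
    AGE_RANGE_30MIN_PLUS,
    AGE_RANGE_COUNT
};

/* Classify a lifetime (free time minus allocation time) into a bucket. */
static enum age_range lifetime_bucket(uint64_t lifetime_ns)
{
    if (lifetime_ns < 1 * NS_PER_MIN)
        return AGE_RANGE_0_1MIN;
    if (lifetime_ns < 5 * NS_PER_MIN)
        return AGE_RANGE_1_5MIN;
    if (lifetime_ns < 30 * NS_PER_MIN)
        return AGE_RANGE_5_30MIN;
    return AGE_RANGE_30MIN_PLUS;
}

/* Value shown for a bucket: freed count, plus the unfreed count for the last one. */
static uint64_t merged_bucket_count(const uint32_t freed[AGE_RANGE_COUNT],
                                    uint64_t unfreed,
                                    enum age_range bucket)
{
    uint64_t count = freed[bucket];

    if (bucket == AGE_RANGE_30MIN_PLUS)
        count += unfreed;
    return count;
}
```

Only the last bucket mixes two sources, so the freed-only distribution can still be recovered by subtracting the unfreed count from it.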

Implementation Details

Part 1: BPF Changes (malloc_free/bpf/malloc_free.bpf.c)

Change 1: Remove histogram update from allocation time

Location: Lines 284-286 in update_age_statistics()

Current code:

static void update_age_statistics(struct malloc_record *record,
                                  u64 alloc_timestamp_ns)
{
    // Update oldest allocation timestamp
    if (record->oldest_alloc_timestamp == 0 ||
        alloc_timestamp_ns < record->oldest_alloc_timestamp) {
        record->oldest_alloc_timestamp = alloc_timestamp_ns;
    }

    // Update running totals for average age calculation
    record->total_unfreed_count++;
    record->total_age_sum_ns += alloc_timestamp_ns;

    // Update age histogram if needed
    u32 age_range = calculate_age_histogram_range(alloc_timestamp_ns);  // ❌ Remove this
    record->age_histogram[age_range]++;                                  // ❌ Remove this
}

New code:

static void update_age_statistics(struct malloc_record *record,
                                  u64 alloc_timestamp_ns)
{
    // Update oldest allocation timestamp
    if (record->oldest_alloc_timestamp == 0 ||
        alloc_timestamp_ns < record->oldest_alloc_timestamp) {
        record->oldest_alloc_timestamp = alloc_timestamp_ns;
    }

    // Update running totals for average age calculation
    record->total_unfreed_count++;
    record->total_age_sum_ns += alloc_timestamp_ns;

    // ✅ Histogram will be updated in uprobe_free() when allocation is freed
}

Change 2: Remove the initial histogram increment at first allocation (zero-initialization is kept)

Location: Lines 482-488 in handle_alloc_return() (new record creation)

Current code:

// Initialize histogram
for (int i = 0; i < 4; i++) {
    new->age_histogram[i] = 0;
}
u32 age_range = calculate_age_histogram_range(timestamp_ns);  // ❌ Remove this
new->age_histogram[age_range] = 1;                             // ❌ Remove this

New code:

// Initialize histogram
for (int i = 0; i < 4; i++) {
    new->age_histogram[i] = 0;
}
// ✅ Histogram will be populated as allocations are freed

Change 3: Add histogram update in uprobe_free()

Location: After line 633 in uprobe_free() (Statistics Mode section)

Current code:

} else {
    // Statistics Mode: Update malloc record with actual freed size
    // Get the actual allocation size from the event
    u32 actual_size = event->size;

    // Look up malloc_record by PID (per-process tracking)
    struct malloc_record *entry =
        bpf_map_lookup_elem(&malloc_records, &pid);
    if (entry) {
        // Update free_size with ACTUAL bytes freed (not just a counter)
        entry->free_size += actual_size;

        // Update age statistics when memory is freed
        u64 current_time = bpf_ktime_get_ns();
        update_age_statistics_on_free(entry, current_time);

        bpf_map_update_elem(&malloc_records, &pid, entry,
                            BPF_ANY);
    }

    // Delete the event to save memory
    bpf_map_delete_elem(&malloc_event_records, &key);
}

New code:

} else {
    // Statistics Mode: Update malloc record with actual freed size
    // Get the actual allocation size from the event
    u32 actual_size = event->size;

    // Look up malloc_record by PID (per-process tracking)
    struct malloc_record *entry =
        bpf_map_lookup_elem(&malloc_records, &pid);
    if (entry) {
        // Update free_size with ACTUAL bytes freed (not just a counter)
        entry->free_size += actual_size;

        // Update age statistics when memory is freed
        u64 current_time = bpf_ktime_get_ns();
        update_age_statistics_on_free(entry, current_time);

        // ✅ NEW: Update age histogram based on allocation lifetime
        u64 alloc_timestamp = event->alloc_timestamp_ns;
        u32 age_range = calculate_age_histogram_range(alloc_timestamp);
        entry->age_histogram[age_range]++;

        bpf_map_update_elem(&malloc_records, &pid, entry,
                            BPF_ANY);
    }

    // Delete the event to save memory
    bpf_map_delete_elem(&malloc_event_records, &key);
}
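
The new code relies on `event->alloc_timestamp_ns` being set and not later than the time of the free. A defensive way to derive the lifetime is sketched below in plain C. `lifetime_ns` is a hypothetical helper, and treating a zero timestamp as "unknown" is an assumption rather than something this PR specifies. Whether `calculate_age_histogram_range()` already performs an equivalent subtraction internally is not shown here.

```c
#include <stdint.h>

/*
 * Lifetime of an allocation in nanoseconds, clamped to 0 when the
 * allocation timestamp is missing (0) or lies after the free timestamp.
 */
static uint64_t lifetime_ns(uint64_t alloc_timestamp_ns,
                            uint64_t free_timestamp_ns)
{
    if (alloc_timestamp_ns == 0 || free_timestamp_ns < alloc_timestamp_ns)
        return 0;
    return free_timestamp_ns - alloc_timestamp_ns;
}
```

Clamping avoids an unsigned wraparound that would otherwise push a malformed event into the 30+ minute bucket.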

Part 2: Rust Changes (malloc_free/malloc_free.rs)

Change: Add unfreed count to histogram display

Location: Find where age histogram is displayed (if implemented) or prepare for future display

When displaying the age histogram for a process:

// Read histogram from BPF malloc_record
let mut histogram = record.age_histogram;

// Add unfreed allocations to the longest age bucket (30+ min)
// This is a conservative estimate - if they're still unfreed, they're long-lived
histogram[AGE_RANGE_30MIN_PLUS as usize] += record.total_unfreed_count;

// Now display the histogram
println!("Age Distribution:");
println!("  0-1 min:    {} allocations", histogram[AGE_RANGE_0_1MIN as usize]);
println!("  1-5 min:    {} allocations", histogram[AGE_RANGE_1_5MIN as usize]);
println!("  5-30 min:   {} allocations", histogram[AGE_RANGE_5_30MIN as usize]);
println!("  30+ min:    {} allocations (includes {} unfreed)",
         histogram[AGE_RANGE_30MIN_PLUS as usize],
         record.total_unfreed_count);

Note: The exact location depends on where histogram display is implemented. If not yet implemented, this logic should be added when the feature is exposed to users.

Part 3: Documentation Updates

Update malloc_free.md

Add section explaining age histogram semantics:

### Age Histogram (Statistics Mode)

The age histogram tracks allocation lifetime patterns:

- **Freed allocations**: Counted in buckets based on how long they lived before being freed
  - 0-1 minute: Short-lived allocations
  - 1-5 minutes: Medium-lived allocations
  - 5-30 minutes: Long-lived allocations
  - 30+ minutes: Very long-lived allocations

- **Unfreed allocations**: All counted in the 30+ minute bucket (conservative estimate)

This provides insight into:
- Memory usage patterns (mostly short-lived or long-lived allocations?)
- Potential memory leaks (large counts in 30+ minute bucket)
- Allocation lifecycle behavior

**Example Interpretation:**

Age Distribution:
0-1 min: 1000 allocations (frequent, short-lived - good)
1-5 min: 50 allocations (medium lifetime)
5-30 min: 10 allocations (longer lifetime)
30+ min: 100 allocations (includes 90 unfreed - investigate)


If the 30+ minute bucket is large, check `oldest_age` and `avg_age` to understand
the unfreed allocations better.
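
The example can be read more precisely by separating the freed and unfreed parts of the last bucket. The helper below is an illustrative C sketch with hypothetical names (`last_bucket_split`, `split_last_bucket`). It assumes the displayed 30+ minute value already includes the unfreed count, as described above.

```c
#include <stdint.h>

/* Breakdown of the displayed 30+ minute bucket. */
struct last_bucket_split {
    uint64_t freed_after_30min; /* lived 30+ minutes, then were freed */
    uint64_t unfreed;           /* still live, actual age unknown */
    unsigned unfreed_percent;   /* unfreed share of all counts, rounded down */
};

static struct last_bucket_split split_last_bucket(const uint64_t displayed[4],
                                                  uint64_t unfreed)
{
    struct last_bucket_split out = {0, unfreed, 0};
    uint64_t total = displayed[0] + displayed[1] + displayed[2] + displayed[3];

    /* Guard against inconsistent input where unfreed exceeds the bucket. */
    if (displayed[3] >= unfreed)
        out.freed_after_30min = displayed[3] - unfreed;
    if (total > 0)
        out.unfreed_percent = (unsigned)(unfreed * 100 / total);
    return out;
}
```

With the numbers from the example (1000, 50, 10, and 100 with 90 unfreed), this yields 10 allocations that were actually freed after 30+ minutes, and an unfreed share of 7% of the 1160 counted allocations.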

Benefits of This Solution

  1. Works in Statistics Mode: Low memory overhead, no need for Trace Mode
  2. Accurate for freed allocations: Lifetime calculated at free time with real age
  3. Conservative for unfreed allocations: Highlights long-lived allocations
  4. Simple implementation: Minimal code changes
  5. Complementary to existing metrics: Works alongside oldest_age and avg_age
  6. Actionable insights: Helps identify memory usage patterns and potential leaks

Testing Plan

After implementation:

  1. Test short-lived allocations:

    # Program that allocates and frees quickly
    sudo ./target/debug/malloc_free -d 10
    # Expected: Most counts in 0-1 minute bucket

  2. Test long-lived allocations:

    # Program with memory leaks or long-lived allocations
    sudo ./target/debug/malloc_free -d 60
    # Expected: Significant counts in 30+ minute bucket

  3. Verify unfreed count:

    • Check that total_unfreed_count matches the unfreed allocations
    • Verify it's added to the 30+ minute bucket in display

  4. Compare with oldest_age/avg_age:

    • Ensure histogram aligns with other age metrics
    • Unfreed allocations are always placed in the 30+ minute bucket, so a non-zero count there does not by itself mean oldest_age >= 30 minutes; use oldest_age and avg_age to judge how old the unfreed allocations really are
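
The testing plan needs a target process with a known mix of freed and unfreed allocations. A minimal C workload could look like the sketch below. `run_workload` and its parameters are hypothetical and not part of this repository, and how the tool is pointed at such a process depends on options this PR does not describe.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate `count` blocks of `size` bytes. Every `leak_every`-th block is
 * deliberately kept (never freed); all others are freed immediately.
 * Returns the number of blocks this function left unfreed.
 */
static size_t run_workload(size_t count, size_t size, size_t leak_every)
{
    size_t leaked = 0;

    for (size_t i = 0; i < count; i++) {
        void *block = malloc(size);
        if (block == NULL)
            break;
        memset(block, 0xA5, size); /* touch the memory so it is really used */

        if (leak_every != 0 && (i + 1) % leak_every == 0)
            leaked++;              /* intentionally not freed */
        else
            free(block);
    }
    return leaked;
}
```

Calling `run_workload(100, 64, 10)` leaves 10 blocks unfreed and frees the other 90 almost immediately, so the freed blocks would be expected in the 0-1 minute bucket. Compilers are allowed to remove malloc/free pairs whose result is unused, so building the workload without optimization (for example with `-O0`) is the safer choice.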

fixes #39

…cation

The age histogram was incorrectly updating at allocation time (when age=0),
causing all allocations to go into the 0-1min bucket and never being
decremented on free.

Changes:
- BPF: Remove histogram update from update_age_statistics()
- BPF: Remove histogram initialization at first allocation
- BPF: Add histogram update in uprobe_free() to track actual lifetime
- Rust: Add unfreed allocations to 30+ min bucket (conservative estimate)

The histogram now shows:
- Buckets 0-2: Count of allocations freed within those lifetimes
- Bucket 3 (30+ min): Freed allocations + all unfreed allocations

This provides accurate lifetime distribution and highlights long-lived/leaked
memory by conservatively counting unfreed allocations as 30+ minutes old.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@saimizi saimizi merged commit 613bd64 into main Oct 10, 2025
3 checks passed
@saimizi saimizi deleted the fix/age-histogram-lifetime-tracking branch October 10, 2025 14:34